After looking through the data, I decided to use 3 metrics to find low-paid, high-performing players. I believe the most important traits of a player are the number of points they score (PTS), the number of blocks they make (BLK) and their effective field goal percentage (eFG.). Points help the team win the game, while blocks prevent the opposing team from scoring. eFG. helps ensure that our players have a certain level of shooting accuracy, which is desirable in basketball players.

I started by categorizing players based on their salary. I wanted to create a categorical variable that indicated whether the player was in the “low pay”, “middle pay” or “high pay” range of salary. After much experimentation, I decided to use 3 clusters for the salary. The results are in the bar graph below, which shows that there is a reasonable number of players in each category.
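As a rough illustration of this step, the sketch below assumes the pay ranges came from a one-dimensional k-means on salary. The data frame, the `Salary` column name and the salary values are hypothetical stand-ins, since the real data set is not shown here.

```r
# Hypothetical stand-in data: the real salary column and values are not shown in this report.
set.seed(1)
players <- data.frame(
  Player = paste0("player_", 1:60),
  Salary = c(runif(25, 1e6, 5e6), runif(20, 8e6, 15e6), runif(15, 20e6, 35e6))
)

# One-dimensional k-means on salary with 3 clusters (assumed method).
salary_km <- kmeans(players$Salary, centers = 3, nstart = 25)

# Cluster numbers from k-means are arbitrary, so rank the clusters by their
# mean salary before attaching the labels.
center_rank <- rank(salary_km$centers[, 1])
pay_labels <- c("low pay", "middle pay", "high pay")
players$pay_range <- factor(pay_labels[center_rank[salary_km$cluster]],
                            levels = pay_labels)

# Counts per category: these are the heights a bar graph would show.
pay_counts <- table(players$pay_range)
print(pay_counts)
barplot(pay_counts, main = "Players per pay range", ylab = "Number of players")
```

Ranking the cluster centers first matters because k-means does not guarantee that cluster 1 is the cheapest group.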

With the players categorized by their salary, I tried to discover how many clusters would be needed to categorize the performance variables (PTS, BLK and eFG.). I used this function to find how much of the variance the clustering explained.

## Evaluate several different numbers of clusters
explained_variance = function(data_in, k){
  ### Running the kmeans algorithm.
  set.seed(1)
  kmeans_obj = kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 30)
  
  ### Variance accounted for by clusters:
  ### var_exp = intercluster variance / total variance
  var_exp = kmeans_obj$betweenss / kmeans_obj$totss
  var_exp  
}

I plotted the explained variance against the number of clusters and applied the elbow method. It looks like 3 clusters is a good choice.
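The elbow plot could be produced along these lines. This is only a sketch: the `perf` data frame is a hypothetical stand-in for the performance columns, and the range of k values is an assumption. The function is repeated so that the snippet runs on its own.

```r
# Copy of the function above so this snippet runs on its own.
explained_variance <- function(data_in, k) {
  set.seed(1)
  kmeans_obj <- kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 30)
  kmeans_obj$betweenss / kmeans_obj$totss
}

# Hypothetical stand-in for the performance columns: three loose groups of players.
set.seed(42)
perf <- data.frame(
  PTS  = c(rnorm(30, 0.2, 0.05), rnorm(30, 0.5, 0.05), rnorm(30, 0.8, 0.05)),
  BLK  = c(rnorm(30, 0.2, 0.05), rnorm(30, 0.6, 0.05), rnorm(30, 0.3, 0.05)),
  eFG. = c(rnorm(30, 0.4, 0.05), rnorm(30, 0.5, 0.05), rnorm(30, 0.7, 0.05))
)

# Explained variance for each candidate number of clusters (range is an assumption).
k_values <- 1:8
var_exp <- sapply(k_values, explained_variance, data_in = perf)

# The "elbow" is the k after which the curve flattens out.
plot(k_values, var_exp, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Between SS / total SS")
```

With a single cluster the between-cluster sum of squares is zero, so the curve starts near 0 and rises toward 1 as k grows.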

I then proceeded to create three clusters for performance and overlay the pay range clusters (low, middle and high). I got some promising results below. Here, we can see that there are some players in the “low pay” range who perform quite well in certain areas. The variance explained by the clusters (between-cluster sum of squares divided by total sum of squares) was 88.9%.
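A minimal sketch of this overlay, using base R graphics and hypothetical stand-in data, might look as follows. The `pay_range` and `perf_cluster` column names are assumptions, and the random values will not reproduce the figure reported above.

```r
# Hypothetical stand-in data: PTS, BLK and eFG. follow the report,
# pay_range is an assumed column name.
set.seed(7)
n <- 90
pay_levels <- c("low pay", "middle pay", "high pay")
players <- data.frame(
  PTS  = runif(n),
  BLK  = runif(n),
  eFG. = runif(n),
  pay_range = factor(sample(pay_levels, n, replace = TRUE), levels = pay_levels)
)

# Three performance clusters, using the same k-means settings as the function above.
perf_cols <- c("PTS", "BLK", "eFG.")
set.seed(1)
perf_km <- kmeans(players[, perf_cols], centers = 3, algorithm = "Lloyd", iter.max = 30)
players$perf_cluster <- factor(perf_km$cluster)

# Share of the total variance explained by the clusters.
var_exp <- perf_km$betweenss / perf_km$totss

# Cross-tabulate performance clusters against pay ranges.
overlay <- table(players$perf_cluster, players$pay_range)
print(overlay)

# 2D view: colour marks the performance cluster, symbol marks the pay range.
plot(players$PTS, players$BLK,
     col = as.integer(players$perf_cluster),
     pch = as.integer(players$pay_range),
     xlab = "PTS", ylab = "BLK")
legend("topright", legend = levels(players$pay_range), pch = 1:3, title = "Pay range")
```

The cross-tabulation gives a quick count of how many low-pay players fall into each performance cluster before looking at the plots.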

However, these 2D graphs will not be enough for us to find good players. A 3D graph will allow us to see the clusters in their entirety. This interactive model is shown below.

As we can see, there are several good candidates for selection.

Nicolas Batum is a good choice because he has a high eFG. with low pay. Jamal Murray is a good choice because he scores lots of points with low pay. In addition, his eFG. and BLK are not too bad (greater than 0).

Bradley Beal is a good choice because he has a lot of blocks and he scores decently (eFG. > 0 and PTS > 0.75).
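The same kind of selection could also be done with a filter instead of reading it off the graph. The sketch below assumes candidates are restricted to the low pay range and reuses the eFG. > 0 and PTS > 0.75 cutoffs mentioned above. The rows and player names are hypothetical and are not values from the real data.

```r
# Hypothetical rows for illustration only: these are not values from the real data.
players <- data.frame(
  Player    = c("player_a", "player_b", "player_c", "player_d", "player_e"),
  PTS       = c(0.90, 0.10, 0.80, -0.30, 0.40),
  BLK       = c(0.20, 1.50, 0.05, 0.60, -0.10),
  eFG.      = c(0.30, 0.10, -0.20, 1.20, 0.50),
  pay_range = c("low pay", "low pay", "low pay", "high pay", "middle pay")
)

pts_cutoff <- 0.75  # from the PTS > 0.75 remark in the text
efg_cutoff <- 0     # from the eFG. > 0 remark in the text

# Keep low-pay players who clear both cutoffs, highest scorers first.
candidates <- subset(players,
                     pay_range == "low pay" & PTS > pts_cutoff & eFG. > efg_cutoff)
candidates <- candidates[order(-candidates$PTS), ]
print(candidates)
```

A filter like this makes the selection rule explicit, although the cutoffs themselves remain a judgment call.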

Overall, I believe these players are underpaid but high-performing. I highly recommend these players for our team.